For this assignment I have chosen to use data from ‘Inside Airbnb’ to help individuals who are looking to invest in a property in New York with a view to renting it out through the Airbnb platform. The information in this report would help answer questions such as:
Inside Airbnb is an independent, non-commercial set of tools and data that allows you to explore how Airbnb is really being used in cities around the world. Using this data, the target audience of this report can examine key metrics that show how Airbnb is being used to compete with the residential housing market.
Inside Airbnb data is publicly available from the Inside Airbnb website (http://insideairbnb.com/get-the-data.html), which provides comprehensive Airbnb market data for many cities worldwide. The table below provides an overview of the fields available in the listings data set for New York City.
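# Load required packages
library(readr)       # read_csv()
library(dplyr)       # %>% pipe
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
# Load the field description csv file
airbnb_description <- read_csv("description.csv", col_names = TRUE)
# Convert to a data frame
df_airbnb_description <- data.frame(airbnb_description)
# Render the field descriptions as a styled table
df_airbnb_description %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))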
| Field | Description |
|---|---|
| id | Unique ID per listing |
| name | Name and description of listed room |
| host_id | Unique host ID |
| host_name | Host name |
| neighbourhood_group | New York neighbourhood group name (Bronx, Brooklyn, Manhattan, Queens, Staten Island) |
| neighbourhood | New York sub-neighbourhood name |
| latitude | Latitude co-ordinates |
| longitude | Longitude co-ordinates |
| room_type | Room type (Entire home/apt, Private room, Shared room, Hotel room) |
| price | Price per night in US$ |
| minimum_nights | Minimum number of nights to book |
| number_of_reviews | Number of reviews for listing |
| last_review | Date of last review |
| reviews_per_month | Average number of reviews received per month |
| calculated_host_listings_count | Number of listings the host has in New York |
| availability_365 | Number of days available out of 365 |
# Load the New York listings csv file
ny_airbnb_listings <- read_csv("newyork_airbnb_listings.csv", col_names = TRUE)
# Take a look at the data
str(ny_airbnb_listings)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 48377 obs. of 16 variables:
## $ id : num 3647 3831 5022 5099 5121 ...
## $ name : chr "THE VILLAGE OF HARLEM....NEW YORK !" "Cozy Entire Floor of Brownstone" "Entire Apt: Spacious Studio/Loft by central park" "Large Cozy 1 BR Apartment In Midtown East" ...
## $ host_id : num 4632 4869 7192 7322 7356 ...
## $ host_name : chr "Elisabeth" "LisaRoxanne" "Laura" "Chris" ...
## $ neighbourhood_group : chr "Manhattan" "Brooklyn" "Manhattan" "Manhattan" ...
## $ neighbourhood : chr "Harlem" "Clinton Hill" "East Harlem" "Murray Hill" ...
## $ latitude : num 40.8 40.7 40.8 40.7 40.7 ...
## $ longitude : num -73.9 -74 -73.9 -74 -74 ...
## $ room_type : chr "Private room" "Entire home/apt" "Entire home/apt" "Entire home/apt" ...
## $ price : num 150 89 80 200 60 79 79 116 150 135 ...
## $ minimum_nights : num 3 1 10 3 45 2 2 30 1 5 ...
## $ number_of_reviews : num 0 279 9 75 49 443 118 94 161 54 ...
## $ last_review : Date, format: NA "2019-08-29" ...
## $ reviews_per_month : num NA 4.62 0.1 0.59 0.39 3.51 0.97 0.73 1.32 0.43 ...
## $ calculated_host_listings_count: num 1 1 1 1 1 1 1 1 4 1 ...
## $ availability_365 : num 365 192 0 13 0 246 0 347 0 40 ...
## - attr(*, "spec")=
## .. cols(
## .. id = col_double(),
## .. name = col_character(),
## .. host_id = col_double(),
## .. host_name = col_character(),
## .. neighbourhood_group = col_character(),
## .. neighbourhood = col_character(),
## .. latitude = col_double(),
## .. longitude = col_double(),
## .. room_type = col_character(),
## .. price = col_double(),
## .. minimum_nights = col_double(),
## .. number_of_reviews = col_double(),
## .. last_review = col_date(format = ""),
## .. reviews_per_month = col_double(),
## .. calculated_host_listings_count = col_double(),
## .. availability_365 = col_double()
## .. )
summary(ny_airbnb_listings)
## id name host_id
## Min. : 3647 Length:48377 Min. : 2438
## 1st Qu.: 9699559 Class :character 1st Qu.: 8288419
## Median :20322645 Mode :character Median : 33067672
## Mean :19893435 Mean : 72458150
## 3rd Qu.:30343546 3rd Qu.:117088883
## Max. :38568081 Max. :294184975
##
## host_name neighbourhood_group neighbourhood latitude
## Length:48377 Length:48377 Length:48377 Min. :40.50
## Class :character Class :character Class :character 1st Qu.:40.69
## Mode :character Mode :character Mode :character Median :40.72
## Mean :40.73
## 3rd Qu.:40.76
## Max. :40.92
##
## longitude room_type price minimum_nights
## Min. :-74.24 Length:48377 Min. : 0.0 Min. : 1.000
## 1st Qu.:-73.98 Class :character 1st Qu.: 69.0 1st Qu.: 1.000
## Median :-73.96 Mode :character Median : 105.0 Median : 2.000
## Mean :-73.95 Mean : 152.7 Mean : 7.425
## 3rd Qu.:-73.93 3rd Qu.: 175.0 3rd Qu.: 5.000
## Max. :-73.71 Max. :10000.0 Max. :1250.000
##
## number_of_reviews last_review reviews_per_month
## Min. : 0.00 Min. :2011-03-28 Min. : 0.010
## 1st Qu.: 1.00 1st Qu.:2018-08-24 1st Qu.: 0.190
## Median : 5.00 Median :2019-07-25 Median : 0.730
## Mean : 24.12 Mean :2018-11-30 Mean : 1.385
## 3rd Qu.: 25.00 3rd Qu.:2019-08-29 3rd Qu.: 2.040
## Max. :654.00 Max. :2019-09-12 Max. :67.600
## NA's :9651 NA's :9651
## calculated_host_listings_count availability_365
## Min. : 1.000 Min. : 0.0
## 1st Qu.: 1.000 1st Qu.: 0.0
## Median : 1.000 Median : 47.0
## Mean : 8.153 Mean :114.1
## 3rd Qu.: 2.000 3rd Qu.:252.0
## Max. :387.000 Max. :365.0
##
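The summary above points to two data-quality issues worth handling before charting: `price` ranges from 0 to 10,000, and `last_review` and `reviews_per_month` each have 9,651 missing values. The base R sketch below is illustrative and not part of the original analysis: the helper `clean_listings()` is hypothetical, the toy data is made up, and it assumes (as the first listing, with 0 reviews and an NA review rate, suggests) that a missing `reviews_per_month` means the listing has never been reviewed.

```r
# Hypothetical helper, not part of the original analysis.
# Assumption: a missing reviews_per_month belongs to a listing with zero reviews.
clean_listings <- function(listings) {
  # Drop $0 listings; which() also skips any NA prices
  listings <- listings[which(listings$price > 0), ]
  # Replace a missing review rate with 0 only where the listing has no reviews
  no_reviews <- is.na(listings$reviews_per_month) & listings$number_of_reviews == 0
  listings$reviews_per_month[no_reviews] <- 0
  listings
}

# Made-up toy data using the same column names as the listings file
toy <- data.frame(
  price = c(150, 0, 89, 10000),
  number_of_reviews = c(0, 3, 279, 0),
  reviews_per_month = c(NA, 0.5, 4.62, NA)
)
cleaned <- clean_listings(toy)
print(cleaned)
```

Very high prices such as 10,000 are kept here, since they may be genuine luxury listings; whether to cap them is a separate judgement call.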
head(ny_airbnb_listings)
## # A tibble: 6 x 16
## id name host_id host_name neighbourhood_g~ neighbourhood latitude
## <dbl> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 3647 THE ~ 4632 Elisabeth Manhattan Harlem 40.8
## 2 3831 Cozy~ 4869 LisaRoxa~ Brooklyn Clinton Hill 40.7
## 3 5022 Enti~ 7192 Laura Manhattan East Harlem 40.8
## 4 5099 Larg~ 7322 Chris Manhattan Murray Hill 40.7
## 5 5121 Blis~ 7356 Garon Brooklyn Bedford-Stuy~ 40.7
## 6 5178 Larg~ 8967 Shunichi Manhattan Hell's Kitch~ 40.8
## # ... with 9 more variables: longitude <dbl>, room_type <chr>,
## # price <dbl>, minimum_nights <dbl>, number_of_reviews <dbl>,
## # last_review <date>, reviews_per_month <dbl>,
## # calculated_host_listings_count <dbl>, availability_365 <dbl>
# Load plotting packages
library(ggplot2)
library(scales)  # comma() axis labels
# Register the Windows font Calibri (windowsFonts() is only available on Windows)
windowsFonts("Calibri" = windowsFont("Calibri"))
# Create RC chart attributes
rc_chartattributes1 <- theme_bw() +
  theme(text = element_text(family = "Calibri")) +
  theme(panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "gray"),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        plot.title = element_text(color = "black", size = 28, face = "bold"),
        plot.subtitle = element_text(color = "gray45", size = 16),
        plot.caption = element_text(color = "gray45", size = 12, face = "italic", hjust = 0),
        legend.position = "bottom")
# Group data by neighbourhood group and room type
ny_airbnb_listings_room_number <- ny_airbnb_listings %>%
  group_by(neighbourhood_group, room_type) %>%
  tally()
# Number of rooms by type by neighbourhood
bar_chart_nh_room_type <- ggplot(data = ny_airbnb_listings_room_number) +
  # geom_col() plots the pre-computed counts in n as stacked bars
  geom_col(aes(x = neighbourhood_group, y = n, fill = room_type)) +
  labs(title = "New York Airbnb room listings by neighbourhood",
       subtitle = "Manhattan has the most rooms listed with c.21,000, with a majority being 'Entire home/apt'",
       caption = "Source: http://insideairbnb.com/get-the-data.html",
       x = "Neighbourhood group",
       y = "Number of rooms",
       fill = "Room type") +
  scale_y_continuous(labels = comma) +
  scale_fill_manual(values = c("#173F5F", "#3CAEA3", "#F6D55C", "#ED553B")) +
  rc_chartattributes1
bar_chart_nh_room_type
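The subtitle's claim that most Manhattan listings are 'Entire home/apt' can be checked by turning the counts behind the chart into shares within each neighbourhood group. A minimal base R sketch on made-up counts that share the column layout of `ny_airbnb_listings_room_number`:

```r
# Made-up counts in the shape produced by group_by(neighbourhood_group, room_type) %>% tally()
room_counts <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Brooklyn", "Brooklyn"),
  room_type = c("Entire home/apt", "Private room", "Entire home/apt", "Private room"),
  n = c(60, 40, 45, 55)
)
# ave() returns each group's total aligned row by row, so n / total is the within-group share
group_total <- ave(room_counts$n, room_counts$neighbourhood_group, FUN = sum)
room_counts$share <- room_counts$n / group_total
print(room_counts)
```

With dplyr, the equivalent on the real counts would be `ny_airbnb_listings_room_number %>% group_by(neighbourhood_group) %>% mutate(share = n / sum(n))`.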
Explanation:
Key Insights:
# Average room price by group_neighbourhood
ny_airbnb_listings_nh_mean <- ny_airbnb_listings %>%
group_by(neighbourhood_group) %>%
summarise(price = round(mean(price), 2))
# Density plot of room price by type by neighbourhood
# Exclude $0 listings, which a log10 scale would map to -Inf
density_price_nh <- ggplot(data = filter(ny_airbnb_listings, price > 0)) +
  geom_density(aes(x = price, color = room_type, fill = room_type), position = "identity", alpha = 0.3) +
  labs(title = "Distribution of New York neighbourhood prices by room type",
       subtitle = "Manhattan exhibits the highest average price, driven by a greater share of 'Entire home/apt' listings",
       caption = "Source: http://insideairbnb.com/get-the-data.html",
       x = "Price (Log10 transformation)",
       y = "Density",
       color = "Type of room",
       fill = "Type of room") +
  scale_color_manual(values = c("#173F5F", "#3CAEA3", "#F6D55C", "#ED553B")) +
  scale_fill_manual(values = c("#173F5F", "#3CAEA3", "#F6D55C", "#ED553B")) +
  scale_x_log10() +
  # Dashed line and label at each neighbourhood group's mean price
  geom_vline(data = ny_airbnb_listings_nh_mean, aes(xintercept = price), linetype = "dashed", color = "gray45") +
  geom_text(data = ny_airbnb_listings_nh_mean, y = 3, aes(x = price + 1400, label = paste0("Mean = ", price)), color = "gray45", size = 4) +
  facet_wrap(~neighbourhood_group, nrow = 1) +
  rc_chartattributes1
density_price_nh
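Because prices are heavily right-skewed (the summary shows a mean of 152.7 against a median of 105, with a maximum of 10,000), the dashed mean lines can be pulled upwards by a handful of expensive listings. A base R sketch comparing the mean with the median per group, on made-up prices that include one extreme listing:

```r
# Made-up prices for two neighbourhood groups; Manhattan includes one extreme listing
prices <- data.frame(
  neighbourhood_group = c(rep("Manhattan", 5), rep("Queens", 5)),
  price = c(100, 120, 150, 180, 10000, 60, 70, 80, 90, 100)
)
# tapply() applies each statistic per group; groups come back in alphabetical order
mean_price <- tapply(prices$price, prices$neighbourhood_group, mean)
median_price <- tapply(prices$price, prices$neighbourhood_group, median)
price_stats <- data.frame(
  neighbourhood_group = names(mean_price),
  mean = as.vector(mean_price),
  median = as.vector(median_price)
)
print(price_stats)
```

Here the single 10,000 listing lifts the Manhattan mean to 2,110 while the median stays at 150. Swapping `mean(price)` for `median(price)` when building `ny_airbnb_listings_nh_mean` would be one way to make the reference lines less sensitive to such outliers.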
Explanation:
Key Insights:
# Load mapping package
library(leaflet)
# Create room type palette
room_type_color <- colorFactor(c("#173F5F", "#3CAEA3", "#F6D55C", "#ED553B"), domain=c("Entire home/apt", "Hotel room", "Private room", "Shared room"))
# Create new price column to show relative sizes in chart
ny_airbnb_listings$price_scaled <- 0.001*(ny_airbnb_listings$price)
# Create map output
newyork_map <- ny_airbnb_listings %>%
leaflet(width = "100%") %>%
addProviderTiles(providers$Stamen.TonerBackground) %>%
setView(-73.96, 40.72, zoom = 11) %>%
addCircleMarkers(~longitude, ~latitude,
                 popup = ~paste("Name:", name, "<br>",
                                "Type:", room_type, "<br>",
                                "Price:", price),
                 radius = ~price_scaled,
                 color = ~room_type_color(room_type), stroke = FALSE, fillOpacity = 0.4) %>%
addLegend("bottomright", colors= c("#173F5F", "#3CAEA3", "#F6D55C", "#ED553B"), labels=c("Entire home/apt", "Hotel room", "Private room", "Shared room"), title="Room types")
newyork_map
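Leaflet measures the `radius` of circle markers in pixels, so scaling it linearly with price makes a circle's area grow with the square of the price: a listing at twice the price looks four times as large. With `price_scaled = 0.001 * price`, a $150 listing also gets a radius of only 0.15 pixels. A sketch of an area-proportional alternative; `price_to_radius()` and its `max_radius` default of 10 pixels are hypothetical choices, not part of the original map:

```r
# Hypothetical helper, not part of the original analysis.
# Maps prices to radii so that circle AREA is proportional to price,
# with the most expensive listing drawn at max_radius pixels.
price_to_radius <- function(price, max_radius = 10) {
  max_radius * sqrt(price / max(price))
}

radii <- price_to_radius(c(100, 400, 10000))
print(radii)
```

Quadrupling the price (100 to 400) doubles the radius, so the area quadruples as intended. On the map this could replace `price_scaled` via `radius = ~price_to_radius(price)` inside `addCircleMarkers()`.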
Explanation:
Key Insights: